Abstract

Approximate-Tree-By-Example (ATBE) is a system designed to support constructing, comparing and querying
sets of ordered labeled trees. In such trees, the nodes are labeled and the left-to-right order among siblings
is significant. Ordered labeled trees have many applications in vision, pattern recognition, molecular biology,
programming compilation and natural language processing.
The system supports two query paradigms:

1. A pattern tree (that may contain variables and don't-care symbols) is matched against a set of data
trees (containing neither variables nor don't-care symbols). This is a generalization of (a) approximate
regular expression matching in strings and (b) exact tree matching with variables.

2. Given two sets of data trees S and T, find (a) the pair of trees s from S and t from T that are closest
(or farthest) with respect to a user-definable editing distance metric, and (b) the pairs of trees from the two
sets that are within a given distance of each other. This is a generalization of the relational join operation.

This  manual  documents  the  ATBE  system  and  illustrates  its  use.    Several  examples  taken  directly  from 
the  complete  implementation  are  discussed  in  detail. 


Chapter  1 
Introduction 

1.1  Overview 

Approximate-Tree-By-Example (ATBE) is a system designed to support constructing, comparing and querying
sets of trees. It contains algorithms for tree pattern matching and a query language to invoke those algorithms.

Trees  considered  here  are  ordered  labeled  ones;  that  is,  their  nodes  are  labeled  and  the  left-to-right  order 
among  siblings  is  significant.  Such  trees  have  many  applications  in  vision,  molecular  biology,  programming 
compilation  and  natural  language  processing,  including  the  representation  of  images  [18],  patterns  [5],  [7], 
[13],  intermediate  code  [3],  [9],  grammar  parses  [6],  [8],  [17],  [23],  dictionary  definitions  [2],  [14],  and  secondary 
structures  of  RNA  [19].  They  are  frequently  used  in  other  disciplines  as  well. 

At present, ATBE is a single-user system. The system is implemented in C and X-windows [28]. The
system is currently installed on SUN 3/50 and SPARC 1 workstations, all running BSD UNIX 4.2. This
manual documents the ATBE system and illustrates its use.

1.2  Preliminaries 

1.2.1      Node  Formats 

We  first  discuss  how  to  specify  trees.  Each  node  in  a  tree  has  a  label.  In  addition,  each  node  may  be 
associated  with  additional  information  called  node  contents.  Examples  of  node  contents  are  size  properties 
for  RNA  structures  [19],  or  lexical  features  for  grammar  parses  [6]. 

We provide a simple C-like language to specify the node formats. Table 1.1 gives a BNF-like syntax for the
language.
    <format>            = {<seq_node_format>}
    <seq_node_format>   = <node_format> | <node_format> <seq_node_format>
    <node_format>       = {<seq_fields>} | {}
    <seq_fields>        = <field>; | <field>; <seq_fields>
    <field>             = <char_field> | <string_field> | <int_field> | <float_field>
    <char_field>        = char <seq_var>
    <string_field>      = string <seq_var>
    <int_field>         = int <seq_var>
    <float_field>       = float <seq_var>
    <seq_var>           = <var> | <var>, <seq_var>
    <var>               = <identifier> | <identifier>[ ] | <identifier>[<positive_int>]
    <identifier>        = <letter>[<letter> | <digit>]*
    <letter>            = A|B|...|Z|a|b|...|z
    <positive_int>      = <nonzero_digit>[<digit>]*
    <nonzero_digit>     = 1|2|3|4|5|6|7|8|9
    <digit>             = 0|1|2|3|4|5|6|7|8|9

Table  1.1:  Grammar  for  the  language  specifying  node  formats  of  trees.  (The  expression  [a]*  means 
zero  or  more  occurrences  of  a.) 

Thus, the information associated with a node can be grouped into a tuple of fields of basic data
types, i.e., integers, reals, characters and strings. Each field can hold a single value of a type, a fixed-size
array of that type, or an arbitrary-length sequence of that type.

For  example,  consider  the  trees  in  Figure  1.1. 


Figure 1.1: Example trees T1, T2 and T3.

Each node in tree T1 has a label only. Each node in tree T2 has a label and a size. Each node in tree T3 has
a label and a sequence of sizes. These node formats could be defined as follows:
    {
        Node_L1
        { }
        Node_L2
        {
            int size;
        }
        Node_L3
        {
            int csize[ ];
        }
    }

Figure 1.2: Example of node format definitions.

Node_L1 (for T1), Node_L2 (for T2) and Node_L3 (for T3) are node formats; size is the name of an integer
field and csize is the name of a sequence of integers. The label itself (present in all nodes) is a sequence
of characters.
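As a rough illustration of how such node formats relate to ordinary data types, the three formats of Figure 1.2 could be mirrored by the C structures below. This is only a sketch under assumptions: the type names `NodeL1`, `NodeL2`, `NodeL3` and the helper `make_node_l3` are hypothetical and not part of ATBE, and an arbitrary-length field such as `csize[ ]` is assumed to be stored as a pointer together with its length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C mirrors of the node formats in Figure 1.2.
   Every node has a label (a character sequence); the node format
   only describes the additional node contents. */
typedef struct { char *label; } NodeL1;            /* Node_L1: label only    */
typedef struct { char *label; int size; } NodeL2;  /* Node_L2: int size;     */
typedef struct {                                   /* Node_L3: int csize[ ]; */
    char *label;
    int *csize;      /* arbitrary-length sequence of integers ... */
    size_t ncsize;   /* ... stored together with its length       */
} NodeL3;

/* Build a Node_L3 node, copying the label and the csize sequence. */
static NodeL3 make_node_l3(const char *label, const int *sizes, size_t n) {
    NodeL3 v;
    v.label = malloc(strlen(label) + 1);
    v.csize = malloc((n > 0 ? n : 1) * sizeof *v.csize);
    assert(v.label != NULL && v.csize != NULL);
    strcpy(v.label, label);
    if (n > 0)
        memcpy(v.csize, sizes, n * sizeof *v.csize);
    v.ncsize = n;
    return v;
}
```

For example, node i of T3, whose contents are the sequence 5 6 7, would correspond to `make_node_l3("i", (int[]){5, 6, 7}, 3)`. A fixed-size field such as `int csize[3];` could simply map to a C array of the same size.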

In  customizing  ATBE,  users  need  to  notify  the  system  of  node  formats  for  trees  of  interest.  In  Section 
3.2.1,  we  shall  show  how  to  input  this  information  to  the  system. 

1.2.2      Editing  Distance  Between  Trees 

We use the editing distance to measure the difference between two trees. Informally, the distance between
trees T1 and T2 is the cost of the cheapest transformation taking T1 to T2 (or vice versa). A transformation
may include three types of edit operations: relabel, delete and insert. Each edit operation has a
(user-defined) cost.

Relabeling modifies a node's label in a tree. Delete takes a node out of the tree, making its children become
its parent's children. Insert is the inverse of delete. Inserting node b as the child of node a causes a consecutive
sequence of the children of a to become the children of b. Figure 1.3 illustrates the edit operations.
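The three edit operations can be sketched on a simple pointer-based tree. This is an illustration only: it assumes a hypothetical `Node` type with a bounded child array (`MAX_CHILDREN`) and hypothetical helpers `node`, `relabel`, `delete_child` and `insert_child`; ATBE's own data structures are not described in this manual.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 16

/* Hypothetical ordered labeled tree node; children kept left to right. */
typedef struct Node {
    const char *label;
    struct Node *children[MAX_CHILDREN];
    int nchildren;
} Node;

/* Allocate a node with the given label and n children (left to right). */
static Node *node(const char *label, int n, ...) {
    Node *v = calloc(1, sizeof *v);
    va_list ap;
    assert(v != NULL && n >= 0 && n <= MAX_CHILDREN);
    v->label = label;
    v->nchildren = n;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        v->children[i] = va_arg(ap, Node *);
    va_end(ap);
    return v;
}

/* Relabel: change the label of v. */
static void relabel(Node *v, const char *label) { v->label = label; }

/* Delete the i-th child v of p: v's children take its place, in order. */
static void delete_child(Node *p, int i) {
    assert(0 <= i && i < p->nchildren);
    Node *v = p->children[i];
    int k = v->nchildren;
    assert(p->nchildren - 1 + k <= MAX_CHILDREN);
    /* shift the right siblings of v to make room for its k children */
    memmove(&p->children[i + k], &p->children[i + 1],
            (size_t)(p->nchildren - i - 1) * sizeof(Node *));
    memcpy(&p->children[i], v->children, (size_t)k * sizeof(Node *));
    p->nchildren += k - 1;
    free(v);
}

/* Insert the new leaf v as the i-th child of p; v adopts the consecutive
   children p->children[i .. j-1] (none are adopted when i == j). */
static void insert_child(Node *p, Node *v, int i, int j) {
    int k = j - i;
    assert(0 <= i && i <= j && j <= p->nchildren && v->nchildren == 0);
    assert(p->nchildren + 1 - k <= MAX_CHILDREN);
    memcpy(v->children, &p->children[i], (size_t)k * sizeof(Node *));
    v->nchildren = k;
    memmove(&p->children[i + 1], &p->children[j],
            (size_t)(p->nchildren - j) * sizeof(Node *));
    p->children[i] = v;
    p->nchildren += 1 - k;
}
```

Applied to a(b(c, d), e), `delete_child(a, 0)` yields a(c, d, e), and `insert_child(a, node("b", 0), 0, 2)` restores a(b(c, d), e), which shows insert acting as the inverse of delete.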

The edit operations give rise to a mapping, which is a graphical specification of what edit operations apply
to each node in the two trees. For example, the mapping in Figure 1.4 shows a way to transform T to T'. The
transformation includes deleting the node labeled b in T and inserting the node labeled f in T'.
Figure 1.3: (i) Relabeling: To change one node label (a) to another (b). (ii) Delete: To delete a
node. (All children of the deleted node b become children of the parent a.) (iii) Insert: To insert a
node. (A consecutive sequence of siblings among the children of a becomes the children of b.)


Figure 1.4: A mapping from T to T'. A dotted line from a node u in T to a node v in T' indicates
that u should be changed to v if u ≠ v, or that u remains unchanged if u = v. The nodes of T not
touched by a dotted line are to be deleted and the nodes of T' not touched are to be inserted. The
mapping shows a way to transform T to T'.
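The editing distance itself can be computed by dynamic programming (cf. [27]). The sketch below follows the well-known keyroot formulation of that dynamic program and, unlike ATBE, fixes the costs: each insert and delete costs 1, and a relabel costs 1 unless the two labels are equal. The `Node` type, the bound `MAXN` and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 8
#define MAXN 64 /* assumed upper bound on nodes per tree in this sketch */

typedef struct Node {
    const char *label;
    struct Node *children[MAX_CHILDREN];
    int nchildren;
} Node;

/* Allocate a node with the given label and n children (left to right). */
static Node *node(const char *label, int n, ...) {
    Node *v = calloc(1, sizeof *v);
    va_list ap;
    assert(v != NULL && n >= 0 && n <= MAX_CHILDREN);
    v->label = label;
    v->nchildren = n;
    va_start(ap, n);
    for (int i = 0; i < n; i++) v->children[i] = va_arg(ap, Node *);
    va_end(ap);
    return v;
}

/* Postorder numbering (1-based): lab[i] is the label of node i and l[i]
   the number of the leftmost leaf descendant of node i. */
typedef struct { const char *lab[MAXN + 1]; int l[MAXN + 1]; int n; } Post;

static int postorder(const Node *v, Post *p) {
    int leftmost = 0;
    for (int c = 0; c < v->nchildren; c++) {
        int k = postorder(v->children[c], p);
        if (c == 0) leftmost = p->l[k];
    }
    assert(p->n < MAXN);
    p->n++;
    p->lab[p->n] = v->label;
    p->l[p->n] = v->nchildren > 0 ? leftmost : p->n;
    return p->n;
}

/* Keyroots: nodes i such that no node numbered above i has the same l(i). */
static void keyroots(const Post *p, int *kr) {
    int seen[MAXN + 1] = {0};
    for (int i = p->n; i >= 1; i--) {
        kr[i] = !seen[p->l[i]];
        seen[p->l[i]] = 1;
    }
}

static int td[MAXN + 1][MAXN + 1]; /* subtree-to-subtree distances */
static int fd[MAXN + 2][MAXN + 2]; /* forest-to-forest distances   */

static int min3(int a, int b, int c) {
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Fill td for node pairs inside the subtrees of keyroots i and j. */
static void treedist(const Post *A, const Post *B, int i, int j) {
    int li = A->l[i], lj = B->l[j];
    fd[0][0] = 0;
    for (int x = li; x <= i; x++) fd[x - li + 1][0] = fd[x - li][0] + 1; /* deletes */
    for (int y = lj; y <= j; y++) fd[0][y - lj + 1] = fd[0][y - lj] + 1; /* inserts */
    for (int x = li; x <= i; x++)
        for (int y = lj; y <= j; y++) {
            int X = x - li + 1, Y = y - lj + 1;
            if (A->l[x] == li && B->l[y] == lj) { /* both forests are trees */
                int r = strcmp(A->lab[x], B->lab[y]) != 0;
                fd[X][Y] = min3(fd[X - 1][Y] + 1, fd[X][Y - 1] + 1, fd[X - 1][Y - 1] + r);
                td[x][y] = fd[X][Y];
            } else { /* reuse a subtree distance computed earlier */
                fd[X][Y] = min3(fd[X - 1][Y] + 1, fd[X][Y - 1] + 1,
                                fd[A->l[x] - li][B->l[y] - lj] + td[x][y]);
            }
        }
}

/* Unit-cost editing distance between the trees rooted at a and b. */
static int tree_distance(const Node *a, const Node *b) {
    static Post A, B;
    int ka[MAXN + 1], kb[MAXN + 1];
    A.n = B.n = 0;
    postorder(a, &A);
    postorder(b, &B);
    keyroots(&A, ka);
    keyroots(&B, kb);
    for (int i = 1; i <= A.n; i++) /* keyroots in increasing order */
        for (int j = 1; j <= B.n; j++)
            if (ka[i] && kb[j]) treedist(&A, &B, i, j);
    return td[A.n][B.n];
}
```

For the trees a(b(c, d), e) and a(c, d, e) the sketch returns 1, since deleting b is the cheapest transformation; supporting user-defined costs would mean replacing the constant 1 terms with calls to cost functions.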

1.2.3      Approximate  Tree  Matching  Operations 

We  consider  two  types  of  approximate  tree  matching  operations: 

• Cutting at node m means removing the subtree rooted at m.

•  Pruning  at  node  m  means  removing  all  the  descendants  of  m.    (Thus,  a  pruning  never  eliminates  the 
entire  tree.) 
The operations are useful in locating portions of a tree that closely match a given pattern. Consider, for
example, the trees in Figure 1.5. T1 exactly matches the subtree rooted at a in T3 if we prune at node b of T3
(or cut at node e). Intuitively, cutting is "looser" than pruning in that it allows arbitrary branches of trees to
be removed. For example, by cutting at node c and node e in T3, the resulting tree matches T2. However, no
pruning operation can be applied in this case to yield such a matching.


Figure  1.5:  Example  trees. 
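Cutting and pruning are simple structural operations. The sketch below shows them on a hypothetical child-array tree; the `Node` type and the helpers `node`, `size`, `free_tree`, `cut_child` and `prune` are illustrative assumptions, not ATBE code. Cutting is expressed relative to the parent of m, because removing the subtree at m detaches it from that parent.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 8

typedef struct Node {
    const char *label;
    struct Node *children[MAX_CHILDREN];
    int nchildren;
} Node;

/* Allocate a node with the given label and n children (left to right). */
static Node *node(const char *label, int n, ...) {
    Node *v = calloc(1, sizeof *v);
    va_list ap;
    assert(v != NULL && n >= 0 && n <= MAX_CHILDREN);
    v->label = label;
    v->nchildren = n;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        v->children[i] = va_arg(ap, Node *);
    va_end(ap);
    return v;
}

/* Number of nodes in the tree rooted at v. */
static int size(const Node *v) {
    int s = 1;
    for (int i = 0; i < v->nchildren; i++)
        s += size(v->children[i]);
    return s;
}

static void free_tree(Node *v) {
    for (int i = 0; i < v->nchildren; i++)
        free_tree(v->children[i]);
    free(v);
}

/* Cutting at m = p->children[i]: remove the whole subtree rooted at m. */
static void cut_child(Node *p, int i) {
    assert(0 <= i && i < p->nchildren);
    free_tree(p->children[i]);
    memmove(&p->children[i], &p->children[i + 1],
            (size_t)(p->nchildren - i - 1) * sizeof(Node *));
    p->nchildren--;
}

/* Pruning at m: remove all descendants of m; m itself stays as a leaf,
   so pruning never eliminates the entire tree. */
static void prune(Node *m) {
    for (int i = 0; i < m->nchildren; i++)
        free_tree(m->children[i]);
    m->nchildren = 0;
}
```

Pruning at b of a(b(c, d), e) leaves a(b, e), whereas cutting at e leaves a(b(c, d)).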

1.3      Organization  of  the  Manual 

Apart  from  this  introductory  chapter,  the  manual  contains  four  chapters.  Chapter  2,  "ATBE  Queries", 
illustrates  ATBE  queries  and  operations,  and  describes  their  syntax  and  semantics. 

Chapter  3,  "ATBE  Interface  Commands",  describes  ATBE  on-screen  and  pull-down  menu  commands. 
These  commands  help  users  customize  the  system,  construct  a  query,  format  the  output  of  the  query,  etc. 

Chapter  4,  "Installing  and  Constructing  a  Custom  Tool",  explains  how  to  install  the  ATBE  system  into  a 
BSD  UNIX  system,  and  how  to  customize  the  system. 

Chapter  5,  "Example  of  an  ATBE  Session",  presents  an  annotated  sample  ATBE  session. 

Remark:  In  addition  to  this  manual,  it  is  helpful  to  refer  to  the  following  papers  when  using  the  ATBE 
system: 

• Simple Fast Algorithms for the Editing Distance between Trees and Related Problems [27]: This paper
defines the editing distance and the mapping between two trees. The paper presents a dynamic
programming algorithm for comparing trees.

• ATBE: A System for Approximate Tree Matching [26]: This paper describes the ATBE architecture and
its implementation. Some ATBE applications and underlying query optimization algorithms are also
discussed.

•  Query  Processing  for  Distance  Metrics  [25]:  This  paper  focuses  on  the  algorithms  for  optimizing  ATBE 
queries. 
Chapter 2

ATBE Queries


In  ATBE,  the  user  formulates  a  query  by  building  a  pattern  tree  on  the  screen,  and  providing  an  appropriate 
statement. 

2.1  Building  A  Tree 

In building a tree, the user may draw it from scratch, may edit an existing tree in the database (e.g., the
existing tree may be a template), may edit a solution tree of another query, or may key in the tree in its linear
form directly. The linear form of the tree is a fully parenthesized expression which is a preorder enumeration
of the tree (i.e., first the root, then the subtrees, from left to right). Node contents (if any) are enclosed in
braces and immediately follow their corresponding node labels.¹ We shall discuss how to build a pattern tree
in detail in Chapter 3.
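To make the linear form concrete, the following sketch writes a tree in preorder as a fully parenthesized expression, with node contents in braces after the label. The `Node` type, its `contents` field (held here as preformatted text such as `size 1`) and the function `linear_form` are hypothetical; the output format follows the linear forms given for the trees of Figure 1.1.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 8

/* Hypothetical node: label, optional contents text, ordered children. */
typedef struct Node {
    const char *label;
    const char *contents;            /* e.g. "size 1"; NULL if none */
    struct Node *children[MAX_CHILDREN];
    int nchildren;
} Node;

static Node *node(const char *label, const char *contents, int n, ...) {
    Node *v = calloc(1, sizeof *v);
    va_list ap;
    assert(v != NULL && n >= 0 && n <= MAX_CHILDREN);
    v->label = label;
    v->contents = contents;
    v->nchildren = n;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        v->children[i] = va_arg(ap, Node *);
    va_end(ap);
    return v;
}

/* Append the linear form of t to the NUL-terminated string in buf
   (capacity cap): root first, then each subtree from left to right. */
static void linear_form(const Node *t, char *buf, size_t cap) {
    size_t len = strlen(buf);
    if (t->contents != NULL)
        snprintf(buf + len, cap - len, "(%s {%s}", t->label, t->contents);
    else
        snprintf(buf + len, cap - len, "(%s", t->label);
    for (int i = 0; i < t->nchildren; i++) {
        len = strlen(buf);
        snprintf(buf + len, cap - len, " ");
        linear_form(t->children[i], buf, cap);
    }
    len = strlen(buf);
    snprintf(buf + len, cap - len, ")");
}
```

With buf initialized to the empty string, the tree d{size 1}(e{size 2}, f{size 3}) is written as `(d {size 1} (e {size 2}) (f {size 3}))`; output that does not fit in buf is truncated rather than overflowing it.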

2.2  Query  Statements 

A  query  statement  can  be  of  type  retrieve,  insert  or  delete.  The  first  one  is  for  information  retrieval  and 
extraction.  The  second  and  third  are  used  for  modifying  the  underlying  database. 

Figure 2.1 illustrates an ATBE query. Node contents for the pattern are not shown on the screen (to
save space), and can be seen through pop-up windows (e.g., the pop-up window associated with
the node labeled N indicates that the node's size is 2). The query statement is entered in the Statement
window. Also shown in the figure is the linear form of the pattern, which was keyed in using the text editor.

The  query-by-example  paradigm  employed  by  ATBE  allows  rapid  and  incremental  development  of  queries, 
which  can  be  easily  refined  to  highlight  certain  structural  properties  of  trees  under  investigation.  Many  systems 
have  used  similar  concepts  in  constructing  queries  [4],  [10],  [12],  [15],  [24],  [29],  [30].  The  difference  is  that, 
whereas  most  of  these  systems  express  operations  in  tabular  skeletons,  ATBE  expresses  operations  in  tree 
structures,  which  represent  the  entries  in  the  underlying  database. 


¹ For example, the linear forms of the trees in Figure 1.1 are (a (b) (c)), (d {size 1} (e {size 2}) (f {size 3})), and
(g {csize 1 2} (h {csize 3 4}) (i {csize 5 6 7})), respectively.
Figure 2.1: A query and the screen layout for ATBE; node contents (e.g., size properties) are
displayed  via  pop-up  windows;  the  string  shown  in  the  key-in  window  is  the  linear  form  of  the 
pattern. 
2.3 Query Description and Interpretation

Table  2.1  gives  a  complete  BNF-like  syntax  of  ATBE  query  statements. 


    <ATBE statement>       = retrieve <tree-type> <tree-variable> from file <file-name>
                                 [ into file <file-name> ] [ where <boolean-expr> ]
                           | insert <tree-name> into file <file-name>
                           | delete <tree-name> from file <file-name>
    <tree-type>            = tree | subtree
    <tree-variable>        = <lowercase-identifier>
    <file-name>            = <uppercase-identifier>
    <tree-name>            = <lowercase-identifier>
    <boolean-expr>         = <boolean-expr> and <boolean-expr>
                           | <boolean-expr> or <boolean-expr>
                           | ( <boolean-expr> )
                           | <distance-op> ( <pattern-tree>, <tree-variable> ) <arithcmp> <expr>
                           | <tree-op> ( <tree-variable> ) <arithcmp> <constant>
    <pattern-tree>         = pa
    <distance-op>          = dist | distwithcut | distwithprune
    <tree-op>              = size | height
    <arithcmp>             = > | ≥ | = | ≠ | < | ≤
    <expr>                 = <constant> | <aggregate-expr>
    <constant>             = [<digit>]+
    <aggregate-expr>       = <aggregate-op> ( <distance-op> ( <pattern-tree>, <iteration-variable> )
                                 where <iteration-variable> is <tree-type> of file <file-name> )
    <aggregate-op>         = min | max
    <iteration-variable>   = <lowercase-identifier>
    <uppercase-identifier> = <uppercase-letter>[<uppercase-letter>|<digit>]*
    <lowercase-identifier> = <lowercase-letter>[<lowercase-letter>|<digit>]*
    <uppercase-letter>     = A|B|...|Z
    <lowercase-letter>     = a|b|...|z
    <digit>                = 0|1|2|3|4|5|6|7|8|9

Table 2.1: Syntax for the ATBE query statements. (The expressions [a], [a]+, [a]* mean zero or
one occurrence of a, one or more occurrences of a, and zero or more occurrences of a, respectively.)

ATBE's  retrieve  statement  has  the  following  form 

retrieve  <tree-type>  <tree-variable>  from  file  <file-name>  where  <boolean-expression> 

The tree-type is one of two words: tree or subtree. The from clause specifies the file to search. The where
clause imposes constraints on trees, specifying conditions a solution (sub)tree must satisfy. The query is
implemented by a search through the specified file in which each data (sub)tree belonging to the file is selected
and stored into the tree-variable. Each time a new (sub)tree is stored in the variable, the boolean expression
is evaluated; if the expression is true, the (sub)tree becomes part of the answer.²
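The evaluation strategy just described amounts to a loop over the candidate (sub)trees of the file. In the sketch below the file is assumed to be an in-memory array of trees and the where clause is modeled as a callback; `Where`, `retrieve`, `visit` and the example predicate `size_at_least` are hypothetical names, not ATBE internals.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

#define MAX_CHILDREN 8

typedef struct Node {
    const char *label;
    struct Node *children[MAX_CHILDREN];
    int nchildren;
} Node;

static Node *node(const char *label, int n, ...) {
    Node *v = calloc(1, sizeof *v);
    va_list ap;
    assert(v != NULL && n >= 0 && n <= MAX_CHILDREN);
    v->label = label;
    v->nchildren = n;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        v->children[i] = va_arg(ap, Node *);
    va_end(ap);
    return v;
}

/* A where clause: returns nonzero if (sub)tree t qualifies. */
typedef int (*Where)(const Node *t, const void *ctx);

static int size(const Node *t) {
    int s = 1;
    for (int i = 0; i < t->nchildren; i++) s += size(t->children[i]);
    return s;
}

/* Example where clause: size(t) >= *(const int *)ctx. */
static int size_at_least(const Node *t, const void *ctx) {
    return size(t) >= *(const int *)ctx;
}

/* Bind every subtree of v (in preorder) to the tree-variable in turn. */
static int visit(Node *v, Where where, const void *ctx, Node **answer, int k) {
    if (where(v, ctx)) answer[k++] = v;
    for (int i = 0; i < v->nchildren; i++)
        k = visit(v->children[i], where, ctx, answer, k);
    return k;
}

/* retrieve tree|subtree t from file <file> where <where>.
   answer must have room for every candidate; returns the number found. */
static int retrieve(Node *const *file, int ntrees, int subtrees,
                    Where where, const void *ctx, Node **answer) {
    int k = 0;
    for (int i = 0; i < ntrees; i++) {
        if (subtrees)
            k = visit(file[i], where, ctx, answer, k);
        else if (where(file[i], ctx))
            answer[k++] = file[i];
    }
    return k;
}
```

For a file holding a(b, c) and d, retrieving trees with size(t) ≥ 2 returns only a(b, c), while retrieving subtrees with size(t) ≥ 1 returns all four subtrees.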

² When the boolean expression is absent, the tree-variable is interpreted as a tree name. For example, the statement
retrieve tree t1 from file F retrieves the specific tree with name t1 from file F.

A boolean expression consists of terms connected with the logical connectives and (for intersection) and or
(for union). Let pa refer to the pattern and t to a (sub)tree in the file. A term has the form

    <tree-op> ( t ) θ <constant>

or

    <distance-op> ( pa, t ) θ <expression>

where θ is a comparison operator (e.g., >, ≥, =, ≠, <, ≤), and expression evaluates to a constant.
There  are  two  kinds  of  tree  operators: 

• size: computes the total number of nodes in the (sub)tree t;

•  height:  computes  the  number  of  edges  in  a  longest  path  from  the  root  to  a  leaf  of  t. 
There  are  three  kinds  of  distance  operators: 

• dist: computes the distance between pa and t;

•  distwithcut:  computes  the  distance  between  pa  and  t,  allowing  zero  or  more  cuttings  at  nodes  from  t 
(cf.  Section  1.2.3); 

•  distwithprune:  computes  the  distance  between  pa  and  t,  allowing  zero  or  more  prunings  at  nodes  from 
t  (cf.  Section  1.2.3). 
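The two tree operators follow directly from their definitions. A minimal sketch, assuming the same kind of hypothetical child-array `Node` type used for illustration elsewhere (the names `node`, `size` and `height` are illustrative):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

#define MAX_CHILDREN 8

typedef struct Node {
    const char *label;
    struct Node *children[MAX_CHILDREN];
    int nchildren;
} Node;

static Node *node(const char *label, int n, ...) {
    Node *v = calloc(1, sizeof *v);
    va_list ap;
    assert(v != NULL && n >= 0 && n <= MAX_CHILDREN);
    v->label = label;
    v->nchildren = n;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        v->children[i] = va_arg(ap, Node *);
    va_end(ap);
    return v;
}

/* size: total number of nodes in the (sub)tree rooted at t. */
static int size(const Node *t) {
    int s = 1;
    for (int i = 0; i < t->nchildren; i++)
        s += size(t->children[i]);
    return s;
}

/* height: number of edges on a longest root-to-leaf path (a leaf has 0). */
static int height(const Node *t) {
    int h = 0;
    for (int i = 0; i < t->nchildren; i++) {
        int hc = 1 + height(t->children[i]);
        if (hc > h) h = hc;
    }
    return h;
}
```

For a(b(c, d), e), size is 5 and height is 2; a single leaf has size 1 and height 0.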

The  expression  can  be  a  constant,  or  an  aggregate  expression;  the  latter  has  the  form 

<aggregate-op>  (  <distance-op>  (  <pattern-tree>,  <iteration-variable>  ) 
where  <iteration-variable>  is  <tree-type>  of  file  <file-name>  ) 

The aggregate operator can be either min or max. The expression is evaluated by binding each (sub)tree
in the specified file to the iteration-variable, and then computing the distance between the pattern, which by
convention is identified by pa, and the (sub)tree (with or without cuttings or prunings). The minimum (or
maximum) of these distance values is then returned as the result.
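Because the aggregate does not depend on the tree-variable, a query such as the one in Figure 2.3 can be evaluated in two passes: compute the minimum (or maximum) once, then keep every tree that attains it. The sketch below assumes the distances d[i] = dist(pa, F[i]) have already been computed; the function name `select_extreme` is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Evaluate  dist(pa, t) = min|max (dist(pa, u) where u is tree of file F)
   given d[i] = dist(pa, F[i]) for the n trees of F. Pass 1 computes the
   aggregate once; pass 2 selects every tree attaining it. Writes the
   matching indices to idx and returns how many there are. */
static int select_extreme(const int *d, int n, int use_max, int *idx) {
    int best = use_max ? INT_MIN : INT_MAX, k = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (use_max ? d[i] > best : d[i] < best) best = d[i];
    for (int i = 0; i < n; i++)
        if (d[i] == best) idx[k++] = i;
    return k;
}
```

With distances 4, 2, 7, 2, the min form selects the second and fourth trees (a tie), and the max form selects only the third.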

Example: Here are some examples of boolean expressions (we denote the pattern as pa and the data (sub)tree
as t):

dist(pa, t) = 4;
    returns true if the distance between pa and t is 4; returns false otherwise;

dist(pa, t) ≤ 12 and height(t) = 5;
    returns true if the distance between pa and t is less than or equal to 12 and the height of t
    is 5; returns false otherwise;

dist(pa, t) = min (dist(pa, u) where u is tree of file F);
    returns true if the distance between pa and t is equal to the distance between pa and
    its closest tree in file F; returns false otherwise.
Example: Here are some examples of retrieve statements:

retrieve tree t from file E where distwithprune(pa, t) = 3;
retrieve subtree t from file F where dist(pa, t) < 10;
retrieve tree t from file G
    where dist(pa, t) = max (dist(pa, u) where u is tree of file G);

2.3.1      Semantics  of  the  Distance  Operators 

In  addition  to  having  constant  nodes,  namely,  ones  whose  labels  and  contents  are  specified,  a  pattern  may 
contain  the  following  marks: 

•  stars  (  *  ); 

•  question  marks  (  ?  ); 

• variables (_x, _y, etc.).

Both  the  stars  and  question  marks  appear  on  edges.  The  variables  appear  on  leaves  and  are  preceded  with 
an  underscore.  These  marks  may  appear  in  several  places  in  a  pattern. 

Let _x1, _x2, ..., _xn be variables in the pattern pa. The variables may be matched with (instantiated into)
arbitrary subtrees in the data tree t.³ (Repeated variables, i.e., different occurrences of the same variable, are
matched with equivalent subtrees.) When computing the distance between pa and t (with or without cuttings
or prunings), the variables will be matched so as to minimize the distance.

The semantics for stars and question marks is more complicated than that for variables. Suppose a star
appears on some edge in pa. The star can be viewed as a pseudo node in pa, which will be mapped to
(instantiated into) zero or more nodes $v_1, v_2, \ldots, v_m$ on a path $P$ in the data tree $t$, i.e., $v_i$ is the
parent of $v_{i+1}$, $1 \le i \le m-1$. Like variable instantiation, the star will also be matched in such a way as to
minimize the distance between pa and t. The instantiation of question marks is very much like that of stars, except
that the instantiation will include certain descendants of nodes on $P$. Specifically, let $v_i^j$, $1 \le i \le m$,
$1 \le j \le k_i$, be the children of node $v_i$ that are not on $P$. The instantiation will include (1) nodes
$v_1, v_2, \ldots, v_m$, (2) all subtrees rooted at nodes $v_i^j$, $1 \le i \le m-1$, $1 \le j \le k_i$, and (3) subtrees
rooted at nodes $v_m^j$ where $j$ belongs to the range $[1, l_1] \cup [l_2, k_m]$, and $l_1 < l_2$. Again, the
instantiation will be chosen so as to minimize the distance.
Figure  2.2  illustrates  the  instantiation  of  these  marks. 


³ The subtrees could be empty.
Figure 2.2: (i) Instantiating variables: the shaded subtrees in t are matched with the variables
in pa. (ii) Instantiating the star: the nodes (black dots) on the path P are matched with the star.
(iii) Instantiating the question mark: the nodes (black dots) on the path P and the shaded subtrees
are matched with the question mark. Notice that some (consecutive) children along with their
descendants (represented by the unshaded subtree in t) of the ending node of P are excluded from
the instantiation; they will be mapped to the nodes underneath the question mark in pa.
Example: Consider the following pattern (pa) and data tree (t):

The variable _x will be mapped to the subtree rooted at b (at zero cost) and _y is mapped to the subtree
rooted at f; the resulting distance between pa and t is 1.

Example: Consider the following pattern (pa) and data tree (t):

The mark * will be mapped to node d (at zero cost) and the other nodes in pa are mapped to nodes having
the same label in t; the resulting distance between pa and t is 2.

Example:  Consider  the  following  pattern  (pa)  and  data  tree  (t): 


The mark ? will be mapped to nodes c, d and e, and the other nodes in pa are mapped to nodes having
the same label in t; the resulting distance between pa and t is 0.
2.4 Application to Information Retrieval

One of the major functions of ATBE is to support (tree) information retrieval. Perhaps the most commonly used
retrieval operation in applications is to find the trees closest to a given pattern.⁴ This type of retrieval
might be expressed in ATBE queries as follows:


    retrieve tree t from file F where dist(pa, t) =
        min (dist(pa, u) where u is tree of file F)

Figure 2.3: ATBE query for finding trees in file F that are closest to the pattern.

Trees obtained from the query are displayed one at a time on the screen.⁵ The user is able to see the best
mapping that yields the distance. When a solution tree is large (e.g., contains hundreds of nodes), its edges
and nodes are shrunk proportionally, so that the entire tree can fit on the screen; users may then lasso the part of
interest and zoom in to see more detail (see Section 3.5).


Figure  2.4:   Horizontal  normal  form  for  the  tree  in  Figure  2.3. 

In situations in which trees represent noisy information, users might wish to find data trees that are within
a certain distance of the pattern (this type of retrieval is known as good-match retrieval [25]). For example,
assuming the pattern is as shown in Figure 2.3, the query


⁴ For example, in analyzing features of a newly sequenced RNA, there may not exist RNAs in the database that exactly match
the new RNA. In this circumstance, researchers often attempt to get those that are most similar to the new one. This type
of query is also known as best-match retrieval [20], [21].

⁵ They are displayed either in vertical normal form (as shown in Figure 2.3) or in horizontal normal form (see Figure 2.4).
retrieve tree t from file F
    where dist(pa, t) ≤ 7 and height(t) > 5

finds data trees that are within distance 7 of the pattern and whose height is greater than 5.

It  is  possible  that  some  portion  of  a  pattern  is  unimportant.  In  such  a  situation,  the  user  may  provide  a 
pattern  containing  variables  and  question  marks,  as  shown  in  Figure  2.5. 

This  query  illustrates  one  use  of  variables  and  question  marks.  The  marks  represent  unimportant  parts  in 
the  pattern. 

In  some  applications,  users  may  wish  to  retrieve  portions  of  trees,  rather  than  entire  trees.  Figure  2.6 
shows  an  ATBE  query  that  finds  subtrees  t  in  file  F  that  exactly  match  a  given  pattern,  allowing  zero  or  more 
prunings  at  nodes  from  t. 

Solution  subtrees  obtained  from  the  query  are  displayed  one  at  a  time;  they  may  also  be  displayed  on  a 
tree  basis,  namely,  the  entire  tree  is  displayed  with  corresponding  subtrees  highlighted.  Both  the  mapping 
and  pruning  information  concerning  solution  subtrees  are  available  and  can  be  seen  on  the  screen  (see  Section 
3.5). 


retrieve  tree  t  from  file  F 
where  dist(pa,  t)  <  2 


Figure  2.5:  ATBE  query  in  which  some  part  of  the  pattern  is  unimportant. 


retrieve subtree t from file F
    where distwithprune(pa, t) = 0

Figure 2.6: ATBE query for retrieving subtrees.

Like most other query languages [1], [22], ATBE also allows users to store solution (sub)trees in a file,
rather than only display them on the screen. For example,
retrieve tree t from file F into file G
    where dist(pa, t) = max (dist(pa, v) where v is tree of file F)

stores in file G the trees of F that are most dissimilar to (i.e., match worst) the pattern.

2.5      Application  to  Information  Extraction 

The  previous  section  presents  several  examples  for  information  retrieval.  Another  major  function  of  ATBE  is 
to  support  information  extraction  from  trees.  Let  us  consider  some  examples  drawn  from  natural  language 
processing. 

Consider  the  following  grammar  parse,  which  represents  the  sentence  "The  boy  reads  the  book"  [6]. 


Figure 2.7: Parse tree representing the sentence "The boy reads the book" [6].

Suppose  the  user  wishes  to  find  all  the  nouns  that  can  be  the  direct  object  of  the  verb  "read"  from  a 
database  of  parsed  text  [6].  He  would  type  the  query  as  shown  in  Figure  2.8.  This  query  retrieves  data  trees 
that  exactly  match  the  pattern,  allowing  zero  or  more  cuttings  at  nodes  from  the  data  trees.  The  cut  subtrees 
represent  don't-care  parts,  i.e.,  they  specify  that  the  given  pattern  should  match  a  data  tree  even  if  that  data 
tree  has  some  additional  branches,  which  are  thus  considered  irrelevant  with  respect  to  this  pattern.  The 
query  illustrates  another  use  of  variables.  Nodes  used  to  instantiate  the  variable  _x  represent  objects  of  the 
verb  "read";  they  can  be  highlighted  on  the  screen  (Figure  2.9). 


    retrieve tree t from file F
        where distwithcut(pa, t) = 0

Figure 2.8: ATBE query for finding nouns that are the direct object of "read".
Figure 2.9: A screen layout.

At times, computational linguists might want to find the semantic properties of a noun, particularly as
determined by the predicates for which it may serve as an argument [16]. Consider, for example, the query
"what can be done to a book?" Here the user wishes to get from the database the set of verbs for which
"book" is the object. In ATBE, this query can be expressed as shown in Figure 2.10.


    retrieve tree t from file F
        where distwithcut(pa, t) = 0

Figure 2.10: ATBE query for finding verbs for which "book" is the object.
This query illustrates the use of stars. The star specifies that a path may contain, at a certain point, zero or
more unspecified intermediate nodes. This operator is useful in locating verbs in more complicated sentences
such as "The boy wanted to read the book" or "The woman knew that the boy wanted to read the book" [6].

Thus, by attaching variables and stars to the pattern, users can extract information (nodes, subtrees) from
the database.⁶


Figure 2.11: Replacing distwithcut(pa, t) = 0 by distwithcut(pa, t) ≤ 1 in the query statement
in Figure 2.8 enables users to locate nouns in sentences with wrong or different tense.

2.6      Updating  Operations 

Having described ATBE's retrieval/extraction operations, we now turn to its updating operations. ATBE
provides insert and delete operations to maintain a database. For example, if the user wishes to erase the
pattern (with name, say, t1) shown in Figure 2.1 from file F, he would type the statement

delete  tl  from  file  F 
Modifying  a  tree  can  be  achieved  by  first  retrieving  it  from  the  file,  e.g., 


⁶ We note that approximate tree matching can be very useful in these applications. For example, users may type in verbs of
the wrong tense (e.g., use "has read" rather than "read" in a present-tense sentence) when inputting trees. Using distwithcut(pa, t) < k
rather than distwithcut(pa, t) = 0 may help to get more complete solutions (see Figure 2.11).
retrieve tree foo from file F

The tree foo now appears on the screen. The user may then edit it and store the new foo back by typing

insert foo into file F
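The retrieve, edit and insert cycle can be pictured with a toy model of a file as a table of named trees held in linear form. This is a sketch under stated assumptions: the `TreeFile` type, its fixed capacities and the functions `find`, `file_insert` and `file_delete` are hypothetical, and the rule that insert replaces an existing tree of the same name is an assumption suggested by the foo example rather than something this manual states.

```c
#include <assert.h>
#include <string.h>

#define MAX_TREES 32
#define NAME_LEN 32
#define FORM_LEN 256

/* Hypothetical in-memory file: named trees stored in linear form. */
typedef struct {
    char name[MAX_TREES][NAME_LEN];
    char form[MAX_TREES][FORM_LEN];
    int n;
} TreeFile;

/* Index of the tree called name, or -1 if the file has no such tree. */
static int find(const TreeFile *f, const char *name) {
    for (int i = 0; i < f->n; i++)
        if (strcmp(f->name[i], name) == 0) return i;
    return -1;
}

/* insert <name> into file: add the tree, or (assumed) replace a tree
   that already has this name. */
static void file_insert(TreeFile *f, const char *name, const char *linear) {
    assert(strlen(name) < NAME_LEN && strlen(linear) < FORM_LEN);
    int i = find(f, name);
    if (i < 0) {
        assert(f->n < MAX_TREES);
        i = f->n++;
    }
    strcpy(f->name[i], name);
    strcpy(f->form[i], linear);
}

/* delete <name> from file: returns 1 if a tree was removed, 0 otherwise. */
static int file_delete(TreeFile *f, const char *name) {
    int i = find(f, name);
    if (i < 0) return 0;
    f->n--;
    if (i < f->n) { /* close the gap left by the removed entry */
        memmove(f->name[i], f->name[i + 1], (size_t)(f->n - i) * NAME_LEN);
        memmove(f->form[i], f->form[i + 1], (size_t)(f->n - i) * FORM_LEN);
    }
    return 1;
}
```

In this model, `file_delete(&f, "t1")` plays the role of delete t1 from file F, and calling `file_insert` with the name foo and the edited linear form plays the role of insert foo into file F.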

2.7      Lack  of  Orthogonality  in  Current  Implementation 

Currently,  queries  that  do  not  contain  marks  (i.e.,  variables,  stars,  or  question  marks)  are  all  implemented. 
Some  queries  that  involve  distance  operators  and  marks  are  not  yet  supported  (see  the  table  below). 


dist 

distwithcut 

distwithprune 

variables 

stars 

question  marks 

†

†
†

Remark:

√: The operators have been implemented in ATBE Alpha version 1.0.

†: The operators have not yet been implemented in Alpha version 1.0; we are now developing algorithms for
them and will implement them in the next version.⁷

‡: We have no plans to implement these operators, because we have not found significant applications for them.


⁷ The current implementation cannot handle repeated variables. Support for them will appear in the next version.


Chapter  3 

ATBE  Interface  Commands 


ATBE  provides  a  set  of  commands  that  help  users  customize  the  system,  formulate  a  query,  and  examine 
solution  trees  as  well  as  the  mapping  information.    Often,  executing  these  commands  may  require  users  to 
click the mouse. In ATBE, only the right mouse button has an effect. For simplicity, we shall use the term
"clicking the mouse" rather than "clicking the right mouse button" in this manual.
A typical session between the user and the system comprises the following steps:

1.  Build  a  pattern  tree. 

2.  Enter  a  statement  in  the  statement  window. 

3.  Examine  the  output  (solution  trees)  of  the  query. 

4.  If  necessary,  edit  a  solution  tree,  make  it  a  new  pattern  and  then  go  to  Step  2. 

There  are  four  kinds  of  windows  the  user  can  see  in  a  session:  canvas  windows,  edit  windows,  dialog 
windows,  and  message  windows. 

The pattern and the solution (sub)trees, as well as the mapping information, are displayed on the canvas window.
Node contents (if any) associated with a node are not shown on the canvas window; they can be seen via a
pop-up window (cf. Figure 2.1).

Statements are entered in pop-up edit windows (see Section 3.4). In an edit window, the user can type in
text using a screen editor.

The dialog window pops up when the system requests data from the user. Every dialog window has
two subwindows, labeled OK and Cancel. After the user types in the data, he either clicks OK, which signals
to the system that the information is good, or clicks Cancel, which signals to the system that he wants to
cancel the information he just entered. The dialog window disappears after the user clicks either of the two
subwindows.


The user can see the pop-up window by moving the cursor to the desired node and then clicking the mouse. Clicking the
mouse again erases the pop-up window.

The current ATBE implementation restricts the text editor to vi only. We are improving the system so that the user can use
whatever editor he chooses as his default.

By clicking a window, we mean first moving the cursor to the window and then clicking the mouse.

[11] gives a good introduction to how to input information via X dialog windows.

The system displays error messages or guidance information via the message window. The user can erase the
message  window  by  clicking  the  mouse  once. 


3.1      Command  Summary 


ATBE  commands  can  be  categorized  into  six  classes. 


Class 1 (Customize)

nodedef, codegen, cost_function, setuptree, list_nodeformats
nodeformat, unit_cost, constant_cost, user_defined_cost

Class 2 (Create)

scratch, key_in

Class 3 (Statement)

retrieve_tree, retrieve_subtree, delete_tree, insert_tree
previous_statement, unix_shell, list

Class 4 (Display)

name, distance, tree/subtree, next_match, mark, focus
unfocus_all, map_withcut, map_withprune, map_only, instantiate
erase_mapinfo, form_toggle, print

Class 5 (Edit)

add_as_a_parent, add_as_a_child, attach_a_star, detach_a_star
attach_a_questionmark, detach_a_questionmark, delete_a_node
cut_a_node, prune_a_node, modify_a_node, copy_a_node
copy_a_subtree, undo_last, undo_all, paste

Class 6 (Exit)

exit

Class titles (i.e., Customize, Create, Statement, Display, Edit, and Exit) are the titles of the pull-down
menus (cf. Figure 2.1). The commands in Class 1 are used to set up node formats and establish edit costs.
Class 2 contains commands to create a new pattern tree. The commands in Class 3 enable the user to input
queries to the system. The commands in Class 4 are used to display solution (sub)trees. Class 5 consists of
commands for modifying a tree. The exit command in Class 6 enables the user to exit the ATBE system.

To illustrate the use of these commands, we assume in the remainder of this manual that trees of interest
have formats Node_L1, Node_L2 and Node_L3, as defined in Section 1.2.1. Further, we assume that the following
mini-database of trees exists:

•  File  E:  This  file  contains  three  trees  (shown  in  Figure  3.1).  They  are  stored  as  follows: 

- t1 (a (b) (c (h) (i)) (e));

-  t2    (f    (g)    (k)    (m   (h)    (i)    (n))); 

- t3 (u (v) (w));

•  File  F:  This  file  contains  two  trees  (shown  in  Figure  3.2).  They  are  stored  as  follows: 

-  t4   (a   {size   1}    (p   {size  2})    (q   {size   3})    (r   {size  4})); 

-  t5    (j    {size  5}    (u   {size  6})    (v   {size   7}    (w   {size  8}))); 

•  File  G:  This  file  contains  three  trees  (shown  in  Figure  3.3).  They  are  stored  as  follows: 

- t6 (a {csize 5} (c {csize 3} (e {csize 1})) (d {csize 6 8}));

- t7 (b {csize 4} (a {csize 2}) (a {csize 3 5 9}));

- t8 (h {csize 1 2} (e {csize 1 2}) (m {csize 1 3}));


Throughout, we shall use the terms "executing a command" and "clicking a command" interchangeably. To execute a
command, the user first moves the mouse to the corresponding class title in the on-screen menu and clicks the mouse. A
pull-down menu will appear. The user then moves the cursor (within the menu) to the desired command; that command becomes
dark. Then, the user clicks the mouse again. The pull-down menu disappears and the system starts to execute the command.

Each tree in a file is stored in the following format: tree_name tree_form. tree_name is the name of the tree, which is the key
for the tree and should be unique in the entire database; tree_form is the linear form of the tree (cf. Section 2.1).


Figure 3.1: Trees in file E.

Figure 3.2: Trees in file F.

Figure 3.3: Trees in file G.

Thus, the trees in file E have format Node_L1, the trees in file F have format Node_L2, and the trees in file
G have format Node_L3.
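To make the linear form concrete, the following C sketch parses a tree form such as the one stored for t1 into a first-child/next-sibling structure and counts its nodes. It is an illustration under stated assumptions only: the type TNode and the functions parse_tree, free_tree and count_nodes are hypothetical (they are not the I/O programs that codegen generates), labels are assumed to be whitespace-free tokens, and node contents are kept as the raw text between the braces.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory node (not a codegen-generated type). */
typedef struct TNode {
    char label[32];
    char contents[64];          /* raw text between { and }, "" if none */
    struct TNode *child;        /* leftmost child */
    struct TNode *sibling;      /* next sibling to the right */
} TNode;

/* Frees a node, its descendants and its right siblings. */
void free_tree(TNode *n)
{
    while (n != NULL) {
        TNode *next = n->sibling;
        free_tree(n->child);
        free(n);
        n = next;
    }
}

static void skip_space(const char **p)
{
    while (isspace((unsigned char)**p))
        (*p)++;
}

/* Parses "(label {contents} child ...)" at *p and advances *p past the
   closing parenthesis. Returns NULL on malformed input. */
TNode *parse_tree(const char **p)
{
    assert(p != NULL && *p != NULL);
    skip_space(p);
    if (**p != '(')
        return NULL;
    (*p)++;
    skip_space(p);

    TNode *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    size_t len = 0;
    while (**p != '\0' && !isspace((unsigned char)**p) && strchr("(){}", **p) == NULL) {
        if (len + 1 < sizeof n->label)
            n->label[len++] = **p;      /* overlong labels are truncated */
        (*p)++;
    }
    if (len == 0) {                     /* every node needs a label */
        free(n);
        return NULL;
    }
    skip_space(p);
    if (**p == '{') {                   /* optional node contents */
        (*p)++;
        len = 0;
        while (**p != '\0' && **p != '}') {
            if (len + 1 < sizeof n->contents)
                n->contents[len++] = **p;
            (*p)++;
        }
        if (**p != '}') {
            free(n);
            return NULL;
        }
        (*p)++;
    }

    TNode **tail = &n->child;           /* where the next child is linked */
    for (;;) {
        skip_space(p);
        if (**p == ')') {
            (*p)++;
            return n;
        }
        TNode *c = parse_tree(p);
        if (c == NULL) {
            free_tree(n);
            return NULL;
        }
        *tail = c;
        tail = &c->sibling;
    }
}

/* Counts the nodes of a tree (and of any right siblings of its root). */
size_t count_nodes(const TNode *n)
{
    size_t k = 0;
    for (; n != NULL; n = n->sibling)
        k += 1 + count_nodes(n->child);
    return k;
}
```

Calling parse_tree on the text that follows the tree name, e.g. "(a (b) (c (h) (i)) (e));" for t1, returns the root labeled a with six nodes in total and leaves the input pointer at the terminating semicolon.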

3.2      Class  1  (Customize) 
3.2.1      Nodedef 

The  nodedef  command  creates  a  file  that  contains  all  node  definitions  (formats)  for  trees  of  interest.  After  the 
user  clicks  this  command,  an  edit  window  as  shown  below  pops  up: 
-- A sentence starting with -- is a comment.
-- Here is an example with two node formats.
-- The first is a node format that does not have node contents.
-- The second is a node format having node contents.
-- {
--   Node_L1
--   {
--   }
--   Node_L2
--   {
--     int size[ ];
--   }
-- }
-- Note that the node formats should be included in parentheses.
-- From here on, you can enter your node formats.

The user then keys in the node definitions in the window. (For example, for our sample database, the user
can key in the node definitions shown in Figure 1.2.) Comments are ignored by the parser.
After the file is saved, all the node definitions are automatically stored in the file named nodedef.

3.2.2      Codegen 

When  the  user  is  satisfied  with  node  formats,  he  clicks  the  codegen  command.   After  a  while,  the  following 
message  appears  on  the  screen: 


Codegen  is  done. 


This  signifies  that  a  collection  of  I/O  programs  (to  read  trees  having  the  user-defined  formats)  as  well  as 
cost  function  programs  specific  to  the  formats  have  been  generated. 

3.2.3      Cost_function 

Different  node  formats  may  be  associated  with  different  cost  functions.  Currently,  for  each  kind  of  node 
format,  ATBE  provides  users  with  two  default  cost  functions:  unit  and  constant.  Unit  assumes  that  all 
edit operations have cost one. Constant assumes that insert and delete have a constant cost and relabel has
another  constant  cost.  These  constants  will  be  entered  directly  when  interacting  with  ATBE  (see  Section 
3.2.7).  In  addition,  for  each  kind  of  node  format,  ATBE  allows  users  to  define  their  own  cost  functions. 

If the user wishes to define his own cost functions, he executes the command cost_function. After clicking
this command, a pop-up window appears, in which the user can see the cost function programs produced by
the command codegen. For each node format, the user can find two empty procedure frames: user_def1 and
user_def2. For example, an empty procedure frame for the node format Node_L3 looks as follows:
int
Node_L3_user_def1(ND1, ND2)
Node_L3 *ND1, *ND2;
{
    ;
}

Figure 3.4: Example of an empty procedure frame.

Now, suppose that in some application, inserting or deleting a node having format Node_L3 costs 5, and
changing a node label costs 7 when the labels are different. The procedure for defining the cost is as follows:

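if (ND1 == NULL)                          /* ND1 absent: insert/delete cost */
    return(5);
else if (ND2 == NULL)                     /* ND2 absent: insert/delete cost */
    return(5);
else if (*(ND1->label) == *(ND2->label))  /* same label value: no cost */
    return(0);
else                                      /* different labels: relabel cost */
    return(7);

Figure 3.5: Example of user-defined cost functions.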

The user can replace the semicolon enclosed in the two braces in the procedure frame shown in Figure
3.4 by the program in Figure 3.5. After the file is saved, the pop-up window disappears.
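For experimenting with such a cost function outside ATBE, Figure 3.5 can be wrapped into a self-contained C function, sketched below with ANSI-style parameters. The Node_L3 structure here is a hypothetical stand-in that models only a string label (the real type comes from codegen). The sketch compares whole labels with strcmp; if label is a character string, Figure 3.5 compares only the first characters, which is equivalent for single-character labels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the codegen-generated Node_L3 type;
   only the label used by Figure 3.5 is modeled. */
typedef struct {
    const char *label;
} Node_L3;

/* Edit cost between ND1 and ND2. A NULL argument means the other node
   is inserted or deleted (cost 5); equal labels cost 0; a relabel costs 7. */
int Node_L3_user_def1(const Node_L3 *ND1, const Node_L3 *ND2)
{
    assert(ND1 != NULL || ND2 != NULL);   /* at least one node must exist */
    if (ND1 == NULL || ND2 == NULL)
        return 5;
    if (strcmp(ND1->label, ND2->label) == 0)
        return 0;
    return 7;
}
```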

Remark: 

•  Currently,  the  user-defined  cost  functions  can  be  applied  to  mark-free  queries  only.  For  queries  containing 
marks  (i.e.,  variables,  stars,  question  marks),  only  the  default  costs  (i.e.,  unit  and  constant  costs)  are 
available. 

• Users may want to experiment with different cost functions as applied to the same data. Alpha version
1.0 allows users to experiment with two different cost functions (i.e., user_def1 and user_def2) in
different runs. Of course, to try yet other cost functions, users may simply compile those.

3.2.4      Setuptree 

After defining the cost functions (or doing nothing if the user wishes to use the defaults only), the user clicks
the command setuptree. After a while, the following message appears:


The unit of cost is often application-dependent.

The custom ATBE system has been generated.


This  signifies  that  an  object  file  called  atbe  has  been  produced,  which  becomes  the  user's  own  tree  tool. 

3.2.5  List_nodeformats

This command lists all the node formats the custom tool can handle (i.e., those that the user entered when
executing the command nodedef (cf. Section 3.2.1)).

3.2.6  Nodeformat 

While  using  the  generated  system,  the  user  may  input  queries  that  refer  to  trees  with  different  node  formats. 
To  compute  the  distance  between  these  trees,  the  user  must  provide  correct  format  information  and  the 
corresponding cost functions. This is done by executing the command nodeformat and then one of the commands unit_cost,
constant_cost or user_defined_cost.

When  the  user  clicks  the  command  nodeformat,  the  following  dialog  window  appears,  requesting  the  node 
format  information: 


Specify the new node format (e.g. Node_L1).

OK

Cancel

Figure 3.6: A pop-up window requesting node format information.

The user types in the node format in the blank box. Then he clicks OK. This node format will be used in
computing tree distances until the user clicks the nodeformat command again and inputs another node format.

For example, in our sample database, suppose that the user first wants to retrieve trees from file E. Since
all trees in this file have format Node_L1, he notifies the system of this format by executing the command
nodeformat and keying in the format in the pop-up window as shown below:
Specify the new node format (e.g. Node_L1).

Node_L1

OK

Cancel

Figure 3.7: Entering the node format information.

After  a  while,  suppose  that  the  user  becomes  interested  in  trees  in  file  G  and  wishes  to  retrieve  trees 
from  that  file.  Since  all  trees  in  G  have  format  Node_L3,  he  notifies  the  system  of  this  change  by  clicking  the 
command  nodeformat  again  and  keying  in  the  node  format  in  the  pop-up  window. 

From  then  on,  trees  being  compared  must  have  the  format  Node_L3.  This  holds  until  the  user  inputs 
another  node  format. 

3.2.7  Unit_cost, Constant_cost & User_defined_cost

Recall that the distance is the total cost of the edit operations (relabel, insert and delete) needed to transform one tree into the other. Specifying
those costs is done by executing one of the commands unit_cost, constant_cost, or user_defined_cost.

When the user clicks the command unit_cost, the system will assume a unit cost for all edit operations.
When  the  user  clicks  the  command  constant_cost,  the  following  dialog  window  appears,  requesting  relabeling 
and  insert/delete  costs: 


Input  the  editing  cost: 

Insert/Delete: 

Relabel: 

OK 

Cancel 

Figure  3.8:   A  pop-up  window  requesting  cost  values. 

The user fills in the insert (and delete) cost (e.g., 5) in the box labeled Insert/Delete, and fills in the
relabeling cost (e.g., 3) in the box labeled Relabel. (Note: The costs for the edit operations
must satisfy the triangle inequality; see [27].)
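Under the constant cost model, the triangle inequality reduces to a single condition. Relabeling a into b must not cost more than deleting a and then inserting b, so with insert/delete cost d and relabel cost r we need r ≤ 2d; the remaining cases (d ≤ r + d and r ≤ r + r) hold for any non-negative costs. The helper names below (constant_cost, constant_costs_ok) are hypothetical and not part of ATBE; a NULL label stands for an absent node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant cost model: NULL stands for an absent node. */
int constant_cost(const char *a, const char *b, int indel, int relabel)
{
    assert(a != NULL || b != NULL);
    if (a == NULL || b == NULL)
        return indel;                        /* insert or delete */
    return strcmp(a, b) == 0 ? 0 : relabel;  /* keep or relabel */
}

/* Triangle inequality for this model: relabel <= 2 * indel, since a
   relabel must not cost more than a delete followed by an insert. */
bool constant_costs_ok(int indel, int relabel)
{
    return indel >= 0 && relabel >= 0 && (long)relabel <= 2L * indel;
}
```

With the example values above (5 and 3) the check passes; an insert/delete cost of 1 combined with a relabel cost of 3 would violate it.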

If the user clicks the command user_defined_cost, the following dialog window appears, requesting the name
of  the  user-defined  cost  function: 
Specify a new cost function name (e.g. user_def1)

OK 

Cancel 

Figure  3.9:  A  pop-up  window  requesting  the  cost  function  name. 

The user may then enter either user_def1 or user_def2 in the blank box and click OK. (Of course, the
user should already have defined these cost functions when executing the command cost_function (cf. Section 3.2.3).)

Like  the  node  format,  the  cost  value  will  be  used  until  the  user  clicks  one  of  these  commands  again  and 
inputs  other  cost  values. 


3.3      Class  2  (Create) 
3.3.1      Scratch 

When  the  user  wishes  to  build  a  pattern  from  scratch,  he  clicks  the  command  scratch.   A  dialog  window  (as 
shown  below)  pops  up,  requesting  the  node  label  and  contents  of  the  root  of  the  pattern. 


Enter  the  first  node. 

Label: 

Contents: 

OK 

Cancel 

Figure 3.10: A pop-up window requesting new node information.

After  the  user  inputs  the  requested  information,  he  clicks  OK.  A  single  node  will  appear  on  the  screen. 
The  user  may  then  edit  the  pattern  using  Class  5  commands  (see  Section  3.6). 

3.3.2  Key_in

This  command  is  used  when  the  user  wants  to  key  in  the  pattern  directly.  When  the  user  clicks  the  command, 
a  window  pops  up.  The  user  then  types  in  the  pattern  in  the  window  using  the  vi  editor.  After  saving  the 
file,  the  pattern  just  typed  in  will  be  displayed  on  the  screen. 

The user types in the pattern in its linear form, terminated by a semicolon. For example, if the
pattern at hand is as shown in Figure 2.1, he would type in the following tree:
(N {size 2} (I {size 3} (M {size 5} (B {size 1} (M {size 0}
(H {size 7}) (H {size 8}))) (H {size 5})))) ;


3.4      Class  3  (Statement) 


This class of commands allows users to input statements in an edit window using the text editor. A statement
must  be  terminated  by  a  semicolon.  The  statement  together  with  the  pattern  on  the  canvas  window  constitute 
a  query.  After  the  user  finishes  typing  in  the  statement  and  saves  it,  the  edit  window  disappears.  The  system 
then  starts  to  process  the  query  and  display  solution  (sub)trees  (see  Section  3.5).  If  there  are  errors  in  the 
query,  the  system  will  display  error  messages  via  a  message  window. 

3.4.1      Retrieve_tree 

To retrieve trees from the database, the user clicks the command retrieve_tree. After the user clicks this command,
an edit window as shown below pops up:


-- Replace identifier in angle brackets
-- No need to erase comments that begin with --
retrieve tree <identifier>   -- tree variable or tree constant, lower case.
                             -- If boolean-expr is provided, then this field is treated
                             -- as a variable. Otherwise, it must be the name of
                             -- a specific tree in the from file.
from file <identifier>       -- must be upper case
                             -- and have .TRF as a suffix.
into file <identifier>       -- optional. Upper case,
                             -- and have .TRF as a suffix.
where <boolean-expr>         -- optional, e.g. dist(pa, t) < 10

Comments begin with --; these comments guide the user to key in statements. Basically, the user replaces
the terms inside angle brackets by appropriate specifications. Comments are ignored by the parser.
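The file-name rule stated in the window can be checked mechanically. The sketch below encodes one reading of it: a file name must end in .TRF and contain no lower-case letters before the suffix. The function legal_file_name is hypothetical; ATBE's own parser may apply further rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: a non-empty name without lower-case letters,
   followed by the suffix ".TRF". */
bool legal_file_name(const char *name)
{
    assert(name != NULL);
    const char *suffix = ".TRF";
    size_t n = strlen(name), k = strlen(suffix);
    if (n <= k || strcmp(name + n - k, suffix) != 0)
        return false;
    for (size_t i = 0; i < n - k; i++)
        if (islower((unsigned char)name[i]))
            return false;
    return true;
}
```

Under this reading, E.TRF is accepted, while e.TRF, E.trf, .TRF and a bare F are rejected.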

3.4.2  Retrieve_subtree 

To retrieve subtrees from the database, the user executes the command retrieve_subtree. After clicking this
command, the user sees a window similar to the one for the command retrieve_tree, except that the keyword
tree (on the third row) is replaced by subtree.

3.4.3  Delete_tree 

To delete a tree from the database, the user executes the command delete_tree. After clicking the command,
an edit window as shown below pops up:


Note  that  the  file  name  must  have  .TRF  as  a  suffix.  For  example,  E.TRF,  F.TRF,  G.TRF  are  legal  file  names. 
-- Replace identifier in angle brackets
-- No need to erase comments that begin with --
delete <identifier>          -- tree name (your choice) in lower case
from file <identifier>       -- must be upper case
                             -- and have .TRF as a suffix

The  user  may  then  type  in  the  appropriate  delete  statement. 

3.4.4     Insert_tree 

To insert a tree, the user executes the command insert_tree. A window as shown below will appear:


-- Replace identifier in angle brackets
-- No need to erase comments that begin with --
insert <identifier>          -- tree name (your choice) in lower case
into file <identifier>       -- must be upper case
                             -- and have .TRF as a suffix

The  user  may  then  type  in  the  appropriate  insert  statement. 

3.4.5  Previous_statement

After clicking the command previous_statement, an edit window pops up, in which the user can find the
statement  last  entered.  The  user  can  then  modify  the  statement. 

3.4.6  Unix_shell 

This command allows the user to enter the Unix environment. After clicking the command, a window pops
up,  in  which  the  user  sees  the  shell  prompt.  To  get  out  of  the  shell,  simply  type  exit. 

3.4.7  List 

After  executing  the  command  list,  a  window  pops  up,  in  which  the  user  can  see  a  list  of  data  files  (along  with 
trees  stored  in  those  files)  in  the  current  directory. 

3.5      Class  4  (Display) 

This class consists of commands for displaying solution (sub)trees and the mapping information. Trees
are displayed one at a time. Subtrees may be displayed on a tree basis, i.e., the entire tree is displayed
with the corresponding subtrees highlighted, or they may be displayed one at a time. The user can switch
between the two display modes by executing the tree/subtree command
(see below).


All solution subtrees are displayed eventually.

3.5.1  Tree/Subtree

While subtrees are being displayed on a tree basis, if the user clicks the tree/subtree command, a message window will
appear, asking him to locate the root of the desired subtree. The user then moves the cursor to the appropriate
node and clicks the mouse; the corresponding subtree will be displayed on the screen. (If the user does not
select an appropriate node, nothing happens.) The user can then look at each solution subtree in
turn by executing the next_match command (see Section 3.5.2). If, at some point, the user wants
to see subtrees displayed on a tree basis again, he clicks the tree/subtree command again.

For  example,  the  following  illustrates  a  screen  layout,  in  which  the  left  tree  is  the  pattern  and  the  darkened 
nodes  in  the  right  tree  represent  solution  subtrees.  (Thus,  the  subtrees  are  displayed  on  a  tree  basis.) 


Figure  3.11:  A  screen  layout. 

We  consider  two  cases: 

Case 1: Suppose that the user wishes to have each individual solution subtree displayed on the screen. He
clicks  the  command  tree/subtree.  The  following  message  will  appear: 


Click  on  the  root  of  the  subtree. 
The user may then click a node, say c; the screen then looks as follows:


Figure  3.12:  A  screen  layout. 

From now on, the system will display solution subtrees one at a time. Now suppose the user wishes to
use this subtree as a new pattern. He can delete the old pattern using the commands in Class 5 (see Section
3.6). The screen will look as follows:


To see these subtrees, the user executes the next_match command; see Section 3.5.2.

Figure 3.13: A screen layout.


Case  2:  Suppose  that  the  user  wishes  to  use  the  entire  data  tree  shown  in  Figure  3.11  as  a  new  pattern. 
As  in  the  previous  case,  he  can  delete  the  old  pattern  using  the  commands  in  Class  5  and  the  screen  will  look 
as  follows: 
Figure 3.14: A screen layout.
The  user  may  then  edit  the  new  pattern  tree  and  input  a  new  statement. 

3.5.2      Next_match 

Often, there will be a set of (sub)trees satisfying a query. This command enables the user to look at solution
(sub)trees one by one. The (sub)trees are displayed in a cyclic fashion, i.e., the system comes back to the first
(sub)tree after all the (sub)trees have been displayed.

• When displaying trees, executing the command next_match makes the system show the next available
solution tree.

• When displaying subtrees,

- if the system is displaying them on a tree basis, executing the command next_match shows the
next tree on the canvas window;

- if the system is displaying them one at a time, executing the command next_match shows the
next subtree on the canvas window.


The order in which solution trees are displayed is consistent with the order in which they are stored in the file (cf. Section 3.1).
The order in which subtrees are displayed is based on their postordering in a tree.
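The cyclic behavior of next_match, together with the postorder display order of subtrees stated above, can be sketched in C as follows. The node type PNode and the functions postorder and next_match are hypothetical illustrations, not ATBE code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with leftmost-child and right-sibling links. */
typedef struct PNode {
    char label;
    struct PNode *child, *sibling;
} PNode;

/* Stores the nodes of the tree rooted at n (plus any right siblings of n)
   into out in postorder, starting at index k; returns the next free index. */
size_t postorder(const PNode *n, const PNode **out, size_t k)
{
    for (; n != NULL; n = n->sibling) {
        k = postorder(n->child, out, k);   /* children first, left to right */
        out[k++] = n;                      /* then the node itself */
    }
    return k;
}

/* Index of the solution shown after `current` among `count` solutions,
   wrapping around to the first after the last. */
size_t next_match(size_t current, size_t count)
{
    assert(count > 0 && current < count);
    return (current + 1) % count;
}
```

For t1 = (a (b) (c (h) (i)) (e)) from file E, postorder yields b, h, i, c, e, a; with six entries, next_match(5, 6) wraps around to 0.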
For example, suppose that the current screen is as shown in Figure 3.11. Executing the command
next_match results in the following screen (here, we see the next data tree on the screen).


Figure  3.15:  A  screen  layout. 


On the other hand, suppose that the current screen is as shown in Figure 3.12. Executing the command
next_match results in the following screen. (Here, we see the next solution subtree, which is the one rooted at
node e (cf. Figure 3.11).)




Figure  3.16:  A  screen  layout. 


3.5.3      Mark 


When  a  solution  tree  is  large,  the  user  may  wish  to  highlight  a  certain  part  of  the  tree  and  then  zoom  in  to 
see  more  detail.  After  the  user  clicks  this  command,  he  sees  the  following  message  window: 


Locate and click the two diagonally opposite corners.


The  user  may  then  move  the  mouse  to  mark  the  portion  he  wants  to  look  at.  (This  is  done  by  first  clicking 
the  mouse,  moving  the  mouse  to  mark  the  area,  and  then  clicking  the  mouse  again  to  fix  the  area.)  Notice 
that  this  command  only  marks  the  area.  The  command  focus  (see  below)  actually  zooms  into  the  marked 
area. 


3.5.4      Focus 

After the user marks the area he wants to look at, he clicks the command focus. The marked area will be
enlarged and will occupy the entire canvas window. The commands mark and focus can be applied to both
solution (sub)trees and the pattern. They can be applied recursively, i.e., a focused part can be further focused,
and so on.
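The geometry behind mark and focus can be sketched as follows, assuming (the manual does not say so) that focus scales the marked area uniformly so that it fits the canvas while keeping its aspect ratio. Because the two clicks may be any pair of diagonally opposite corners, the corners are normalized first. Point, Rect, rect_from_corners and focus_scale are hypothetical names.

```c
#include <assert.h>

typedef struct { double x, y; } Point;
typedef struct { double left, top, width, height; } Rect;

/* The two clicks can be any pair of diagonally opposite corners,
   so take the minimum coordinates and the absolute extents. */
Rect rect_from_corners(Point p, Point q)
{
    Rect r;
    r.left = p.x < q.x ? p.x : q.x;
    r.top = p.y < q.y ? p.y : q.y;
    r.width = p.x < q.x ? q.x - p.x : p.x - q.x;
    r.height = p.y < q.y ? q.y - p.y : p.y - q.y;
    return r;
}

/* Largest uniform zoom factor at which the marked area still fits the canvas. */
double focus_scale(Rect r, double canvas_w, double canvas_h)
{
    assert(r.width > 0 && r.height > 0);
    double sx = canvas_w / r.width;
    double sy = canvas_h / r.height;
    return sx < sy ? sx : sy;
}
```

Marking corners (10, 40) and (30, 20) gives a 20 × 20 area whose top-left corner is (10, 20); on a 400 × 300 canvas it is enlarged by a factor of 15. Because focus can be applied recursively, successive factors multiply.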

3.5.5  Unfocus_all 

This  command  undoes  the  effect  of  focus.  After  executing  the  command,  the  original  tree  will  be  displayed 
on  the  screen. 

3.5.6  Map_withcut 

This command displays the mapping information between the pattern and the data tree, allowing zero or more
cuttings at nodes from the data tree (cf. Section 1.2.3). The cut nodes are darkened in the data tree. Figure
3.17 shows a result of the command map_withcut.


Figure  3.17:  A  screen  layout. 


3.5.7      Map_withprune 

This command displays the mapping information between the pattern and the data tree, allowing zero or more
prunings at nodes from the data tree (cf. Section 1.2.3). Like cut nodes, the pruned nodes are darkened in
the data tree. Figure 3.18 shows a result of the command map_withprune.

Figure 3.18: A screen layout.


3.5.8  Map_only

This command displays the best mapping yielding the distance between the pattern and the data tree. Figure
3.19 shows a result of the command map_only.

Figure 3.19: A screen layout.


3.5.9     Instantiate 


This command enables the user to see which subtrees (nodes) in the data tree are used to instantiate variables.
The root of the subtree (node) instantiating a variable is connected to the variable by a dashed line, as shown
below.


Figure  3.20:   Example  showing  how  variables  are  instantiated. 


3.5.10     Erase_mapinfo 

This  command  erases  the  mapping  between  trees. 

The commands for displaying the mapping information (i.e., map_withcut, map_withprune, map_only) apply to mark-free
queries only. The current ATBE implementation is unable to display the mapping information between the pattern and solution
trees for queries having marks (i.e., variables, stars, or question marks), except for showing nodes used to instantiate variables.

3.5.11  Name

When  the  user  clicks  this  command,  a  message  window  pops  up,  showing  the  name  of  the  solution  (sub)tree 
currently  displayed  on  the  canvas  window  (we  assume  that  each  tree  has  a  unique  name). 

3.5.12  Distance 

When  the  user  clicks  this  command,  a  message  window  pops  up,  showing  the  name  of  the  solution  (sub)tree 
currently  displayed  on  the  canvas  window,  along  with  the  distance  between  the  solution  (sub)tree  and  the 
pattern. 

3.5.13  Form_toggle 

The  system  displays  trees  (patterns  or  solution  trees)  either  in  vertical  normal  form  or  in  horizontal  normal 
form. (The default is the vertical normal form.) This command has a toggle effect, allowing the user to switch
from  one  form  to  the  other. 

3.6      Class  5  (Edit) 

Commands in this class help the user edit the tree in the window. This includes inserting, deleting, removing,
and copying subtrees (nodes), modifying node labels and contents, and attaching and detaching stars or question marks.
All commands belonging to this class require the user to select a node first. This is done by moving the
cursor to the node and then clicking the mouse. (In the remainder of this section, we refer to the clicked node as N.)
After selecting a node, the user can then edit the tree.

3.6.1  Add_as_a_parent 

This  command  enables  the  user  to  insert  a  node  between  N  and  its  parent.  (If  N  is  the  root  of  the  tree,  then 
the  newly  inserted  node  becomes  the  new  root.) 

After clicking the command, a dialog window as shown in Figure 3.10 appears, requesting the node information.
After the user inputs the node label and contents in the designated boxes, the node is inserted into
its appropriate place.
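The effect of add_as_a_parent on the tree structure can be sketched in C with parent, leftmost-child and right-sibling links. The new node takes N's place among its siblings and N becomes its only child; if N was the root, the new node becomes the root. The type ENode and the function add_as_a_parent are hypothetical and do not describe ATBE's internal data structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical editable node with parent, leftmost-child and
   right-sibling links (not ATBE's internal representation). */
typedef struct ENode {
    char label;
    struct ENode *parent, *child, *sibling;
} ENode;

/* Inserts a new node labeled `label` between n and its parent and returns
   it, or returns NULL if allocation fails. */
ENode *add_as_a_parent(ENode *n, char label)
{
    assert(n != NULL);
    ENode *p = calloc(1, sizeof *p);
    if (p == NULL)
        return NULL;
    p->label = label;
    p->parent = n->parent;
    p->sibling = n->sibling;            /* p takes n's place among siblings */
    p->child = n;                       /* n becomes p's only child */
    if (n->parent != NULL) {
        /* Find the link (the parent's child pointer or a sibling pointer)
           that currently points to n, and redirect it to p. */
        ENode **link = &n->parent->child;
        while (*link != n)
            link = &(*link)->sibling;
        *link = p;
    }
    n->parent = p;
    n->sibling = NULL;
    return p;
}
```

Applied to node c in the tree (a (b) (c) (e)) with a new label x, the result is (a (b) (x (c)) (e)).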

3.6.2  Add_as_a_child 

This command enables the user to insert a node as a child of N. There are two cases:

Case 1: N does not have a child (i.e., N is a leaf). In this case, after the user clicks the command, he sees
a  dialog  window  as  shown  in  Figure  3.10,  which  requests  the  new  node  information. 

Case 2: N has children. In this case, after the user clicks the command, he sees the following message:


Locate the place where the new node is inserted.
The user then clicks the place where he wants to insert the new node. A dialog window as shown in
Figure 3.10 will appear, requesting the node information. After the user fills in the information, the node will
be inserted at the place he just clicked.

3.6.3  Attach_a_star 

After  clicking  this  command,  a  star  will  be  added  on  the  edge  between  N  and  its  parent. 

3.6.4  Detach_a_star 

This command is the inverse of the command attach_a_star. It detaches the star between N and its parent.

3.6.5  Attach_a_questionmark 

This  command  enables  the  user  to  place  a  question  mark  on  the  edge  between  N  and  its  parent. 

3.6.6  Detach_a_questionmark 

This command is the inverse of the command Attach_a_questionmark. It detaches the question mark from the edge between
N and its parent.

3.6.7  Delete_a_node 

This command deletes the clicked node N. The deleted node is stored in the buffer and can be retrieved using the paste
command.

3.6.8  Cut_a_node 

This  command  allows  the  user  to  cut  the  subtree  rooted  at  some  node  (cf.  Section  1.2.3).  The  cut  portion  of 
the  tree  is  stored  in  the  buffer  and  can  be  retrieved  by  the  paste  command. 

3.6.9  Prune_a_node 

This  command  allows  the  user  to  prune  the  descendants  of  some  node  (cf.  Section  1.2.3).  The  pruned  part  is 
stored  in  the  buffer  and  can  be  retrieved  by  the  paste  command. 
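A short Python sketch can illustrate the difference between cutting and pruning. It assumes the informal reading given above: cutting removes N together with all its descendants, while pruning removes only the descendants and leaves N in place as a leaf. The helper names (find_parent, cut, prune) are hypothetical, and Section 1.2.3 remains the authoritative definition. The example uses tree t9 from Chapter 5.

```python
# Hypothetical tree model; see Section 1.2.3 for the formal definitions.
class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])


def to_str(n):
    return "(" + n.label + "".join(" " + to_str(c) for c in n.children) + ")"


def find_parent(root, target):
    """Return the parent of `target` (compared by identity), or None."""
    for child in root.children:
        if child is target:
            return root
        found = find_parent(child, target)
        if found is not None:
            return found
    return None


def cut(root, n):
    """Cut_a_node: remove the subtree rooted at n and return it (the buffer)."""
    parent = find_parent(root, n)
    if parent is None:
        raise ValueError("n is the root or is not in this tree")
    parent.children = [c for c in parent.children if c is not n]
    return n


def prune(n):
    """Prune_a_node: remove all descendants of n, keep n as a leaf, return them."""
    removed, n.children = n.children, []
    return removed


# t9 from Chapter 5: (a (d (e) (f)) (g (h) (i)))
d = Node("d", [Node("e"), Node("f")])
g = Node("g", [Node("h"), Node("i")])
t9 = Node("a", [d, g])
buffer = cut(t9, d)
print(to_str(t9), to_str(buffer))  # (a (g (h) (i))) (d (e) (f))
pruned = prune(g)
print(to_str(t9))                  # (a (g))
```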

3.6.10  Modify_a_node 

This command enables the user to modify the label or contents of a node. When the user clicks the command, the
following dialog window appears:


¹For example, to insert the node between two existing children of N, he moves the cursor to somewhere between them
and clicks the mouse. To insert the node as the rightmost (respectively, leftmost) child of N, he moves the cursor to the
right (respectively, left) of the current rightmost (respectively, leftmost) child and clicks the mouse.
Modify the node.

Label: X

Contents: Y

OK    Cancel

X  and  Y  represent  the  node's  current  label  and  contents.  The  user  may  modify  them,  giving  N  a  new  label 
or  contents. 

3.6.11  Copy_a_node

This command enables the user to copy a node to another place. The clicked node N is copied into the buffer;
the actual copy is performed by the paste command.

3.6.12  Copy_a_subtree 

This command enables the user to copy a subtree to another place. The subtree rooted at N is copied into the
buffer; the actual copy is performed by the paste command.

3.6.13  Paste 

This command inserts (pastes) the contents of the buffer into a child position of the selected node.

3.6.14  Undo_last

This command undoes the effect of the last editing command.

3.6.15  Undo_all 

Users may apply more than one edit command to a tree (which we refer to as the initial tree). The command
Undo_all undoes the effect of all the edit operations. Thus, after this command is executed, the initial
tree is displayed.
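One simple way to realize Undo_last and Undo_all is to save a snapshot of the tree before each edit. The Python sketch below uses this snapshot scheme purely for illustration. The manual does not say how ATBE records its edit history, and EditSession is a hypothetical name. Trees are represented as nested lists of the form [label, child, ...].

```python
import copy


class EditSession:
    """Hypothetical sketch of Undo_last / Undo_all using snapshots.

    A deep copy of the tree is saved before every edit.
    """

    def __init__(self, tree):
        self.tree = tree
        self._history = []  # snapshots, oldest (the initial tree) first

    def apply(self, edit):
        """Save a snapshot, then run edit(tree), which mutates the tree."""
        self._history.append(copy.deepcopy(self.tree))
        edit(self.tree)

    def undo_last(self):
        """Restore the tree as it was before the most recent edit."""
        if self._history:
            self.tree = self._history.pop()

    def undo_all(self):
        """Restore the initial tree and forget the whole edit history."""
        if self._history:
            self.tree = self._history[0]
            self._history.clear()


# Trees as nested lists: [label, child, child, ...]
s = EditSession(["a", ["b"]])
s.apply(lambda t: t.append(["c"]))
s.apply(lambda t: t.append(["d"]))
s.undo_last()
print(s.tree)   # ['a', ['b'], ['c']]
s.undo_all()
print(s.tree)   # ['a', ['b']]
```

Snapshots keep the sketch short, but they use memory proportional to the number of edits. Logging the inverse of each edit is a common, lighter alternative.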


Chapter  4 

Installing and Constructing a Custom System


Having described the ATBE query and interface commands, we now describe how to install ATBE on a BSD
UNIX system and how to customize it.

4.1  File Structures

Because ATBE is written entirely in C and X-windows, it is portable to any UNIX system that supports
X-windows. Currently, alpha release 1.0 is distributed on a tape written in tar format at 1600 bpi. The source
code, command files, and manual are distributed among several directories, as shown in Figure 4.1.


Figure  4.1:  File  structures  of  ATBE. 
ATBE is the main directory. It contains six entries (five subdirectories and one file):

• ATBE/Makefile: This is a command file. Executing the command make produces the executable
program atbe in the ATBE/bin directory.

• ATBE/atbesrc: This subdirectory contains the source code for the graphical interface, as well as the
parser and query optimizer.

•  ATBE/bin:  This  subdirectory  contains  command  files  and  the  executable  program  atbe. 

•  ATBE/codegensrc:  This  subdirectory  contains  the  source  code  for  the  program  generator,  which  is 
used  for  customizing  the  system  (see  Section  4.3). 
• ATBE/doc: This subdirectory contains the manual (atbemanu.tex) in LaTeX format.

•  ATBE/treecomsrc:  This  subdirectory  contains  the  source  code  for  the  tree  comparison  algorithms. 

4.2  Installing the ATBE System

To  install  the  system,  proceed  as  follows: 

1. Select the installation location: Decide where the ATBE directory is to be located. We assume below
that loc/ATBE is the full path name of that directory.

2.  Install  the  ATBE  directory:  Change  your  working  directory  to  loc,  mount  the  tape,  and  execute  the 
command  tar  xv. 

3. Make both the operating system and ATBE aware of the ATBE directory: Edit your .login file, adding
the full path name loc/ATBE/bin to the PATH variable and adding the command setenv ATBE loc/ATBE
to the file.

4. Build the system: Change your working directory to loc/ATBE and type the command make.

All the files from the tape are now in the loc/ATBE directory, and the executable program atbe has been produced
in the loc/ATBE/bin directory. You may then go to any directory and type the command atbe there. This
displays the main ATBE window on the screen.
Figure 4.2: The main ATBE window.


4.3  Customizing the ATBE System


To build a custom system, you first need to define node formats for the trees of interest. To input these node
definitions, click the command customize; the following pull-down menu appears on the screen:
nodedef
codegen
cost_function
list_nodeformats
constant_cost
user_defined_cost

Figure 4.3: Customize menu.


Then click nodedef on the menu. An edit window pops up, requesting node definitions. Since the trees of
interest have formats Node_L1, Node_L2, and Node_L3 (cf. Section 3.1), key these definitions (as shown in
Figure 1.2) into the window.

Next, execute the command codegen. As mentioned earlier, this produces a collection of I/O programs
(to read trees of the above formats) and cost-function programs specific to those formats. If you want to use
your own cost functions, execute the command cost_function and input them to the system (cf.
Section 3.2.3). Finally, execute the command setuptree to produce the custom system. Figure 4.4 illustrates the
procedure, showing how the relevant source programs are linked when the system is customized. The generated
system can handle queries that access trees in the above formats.
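A cost function supplies the price of the three edit operations: insert, delete, and relabel. The Python sketch below models this with three callables. CostModel, unit_cost, constant_cost, size_aware_cost, and script_cost are hypothetical names. The size field and the size-aware model are invented examples of a user-defined cost. The C programs that codegen and cost_function actually produce, and their interface, are covered in Section 3.2.3 and are not reproduced here. The constant model with insert/delete cost 5 and relabeling cost 3 corresponds to the setting used in the session of Chapter 5.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    label: str
    size: Optional[int] = None  # e.g. a {size n} field (assumed for illustration)


@dataclass(frozen=True)
class CostModel:
    insert: Callable[[Node], float]
    delete: Callable[[Node], float]
    relabel: Callable[[Node, Node], float]


def unit_cost():
    """Every insertion, deletion, and label change costs 1."""
    return CostModel(lambda n: 1, lambda n: 1,
                     lambda a, b: 0 if a.label == b.label else 1)


def constant_cost(ins_del, rel):
    """Fixed insert/delete cost and fixed relabeling cost."""
    return CostModel(lambda n: ins_del, lambda n: ins_del,
                     lambda a, b: 0 if a.label == b.label else rel)


def size_aware_cost():
    """A made-up user-defined model: relabeling also charges |size difference|."""
    def relabel(a, b):
        label_part = 0 if a.label == b.label else 1
        return label_part + abs((a.size or 0) - (b.size or 0))
    return CostModel(lambda n: 1, lambda n: 1, relabel)


_OPS = {"ins": "insert", "del": "delete", "rel": "relabel"}


def script_cost(model, script):
    """Total cost of an edit script of ("ins", n), ("del", n), ("rel", a, b)."""
    return sum(getattr(model, _OPS[op])(*args) for op, *args in script)


a, b, c = Node("a"), Node("b"), Node("c")
script = [("del", a), ("ins", b), ("rel", a, c), ("rel", b, b)]
print(script_cost(unit_cost(), script))          # 3
print(script_cost(constant_cost(5, 3), script))  # 13
```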
[Diagram: node formats (definitions) -> codegen program generator -> I/O & cost function programs; these, together with the tree comparison algorithms, are compiled and linked into the custom tree tool.]

Figure 4.4: Diagram showing how the ATBE system is customized.


Chapter  5 

Example  of  an  ATBE  Session 


In this chapter, we present an annotated example of an ATBE session. We assume that the three files of trees
listed in Section 3.1 already exist in the database.

Suppose  you  wish  to  create  a  new  file  H  and  insert  the  following  two  trees  into  it: 

t9 (a (d (e) (f)) (g (h) (i)));
t10 (c (h) (i) (j) (k));
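The parenthesized notation above can be read with a small recursive-descent parser. In this notation, each node is a label, an optional {...} contents field, and then its children in left-to-right order; each stored tree is named and terminated by ';'. The Python sketch below is written against the examples in this chapter only. It is not ATBE's own input code, and the names TOKENS, parse_tree, and parse_file are hypothetical. Each node is returned as a (label, contents, children) tuple.

```python
import re

# Tokens: parentheses, a {...} contents field, the ';' terminator, or a label.
TOKENS = re.compile(r"\(|\)|\{[^}]*\}|;|[^\s(){};]+")


def parse_tree(tokens, i):
    """Parse '(label {contents}? child*)' starting at tokens[i].

    Returns ((label, contents, children), index_after_tree).
    """
    if i + 1 >= len(tokens) or tokens[i] != "(" or tokens[i + 1] in ("(", ")", ";"):
        raise SyntaxError(f"expected '(label' at token {i}")
    label, i = tokens[i + 1], i + 2
    contents = None
    if i < len(tokens) and tokens[i].startswith("{"):
        contents, i = tokens[i][1:-1].strip(), i + 1
    children = []
    while i < len(tokens) and tokens[i] == "(":
        child, i = parse_tree(tokens, i)
        children.append(child)
    if i >= len(tokens) or tokens[i] != ")":
        raise SyntaxError(f"missing ')' for node {label!r}")
    return (label, contents, children), i + 1


def parse_file(text):
    """Parse a sequence of 'name (tree);' entries into {name: tree}."""
    tokens = TOKENS.findall(text)
    trees, i = {}, 0
    while i < len(tokens):
        name = tokens[i]
        tree, i = parse_tree(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ";":
            raise SyntaxError(f"missing ';' after tree {name!r}")
        trees[name] = tree
        i += 1
    return trees


trees = parse_file("t9 (a (d (e) (f)) (g (h) (i)));\nt10 (c (h) (i) (j) (k));")
print(trees["t10"][0], len(trees["t10"][2]))  # c 4
```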

Since the trees have format Node_L1, click the command nodeformat in the custom menu and key in
Node_L1 in the pop-up window (cf. Section 3.2.6). Since we assume that all edit operations have unit cost, also click
the command unit_cost. Then input the trees (either by keying them in or by editing) and type the corresponding insert
statements. The second insert statement displays tree t10 as follows:
Figure 5.1: A screen layout.


Suppose you then want to retrieve the trees in file H that are within distance 8 of tree t1 in file E. First retrieve
t1 from E (and erase t10 so that t1 is the only tree left on the screen). The screen looks as follows:
Figure 5.2: A screen layout.


Then  issue  the  statement 


retrieve tree t from file H where dist(pa, t) < 8;

The  result  of  this  statement  is  as  follows: 
Figure 5.3: A screen layout.


The  left  tree  is  the  pattern;  the  right  tree  is  the  first  solution  tree. 

Clicking the command next_match lets you see the next solution tree (which is t10 here), as shown below.
Figure 5.4: A screen layout.
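The dist function in the query above is an editing distance between ordered labeled trees: the minimum total cost of node insertions, deletions, and relabelings that turn one tree into the other. The manual does not spell out the algorithm at this point, so the Python sketch below uses the classic Zhang-Shasha dynamic program as an illustration. Note the following limits of the sketch:

- It is not taken from ATBE/treecomsrc.
- It computes whole-tree distance only, not the pruning or cutting variants used later.
- Trees are written as hypothetical (label, [children]) tuples, with constant costs ins, dele, and rel.

Tree t1 is not reproduced in this chapter, so the example compares t9 and t10 instead.

```python
def postorder(tree):
    """Return (labels, leftmost) for a (label, [children]) tree.

    labels[k] is the label of the k-th node in postorder; leftmost[k] is the
    postorder index of the leftmost leaf descendant of that node.
    """
    labels, leftmost = [], []

    def walk(node):
        label, children = node
        first_leaf = None
        for child in children:
            k = walk(child)
            if first_leaf is None:
                first_leaf = leftmost[k]
        labels.append(label)
        k = len(labels) - 1
        leftmost.append(k if first_leaf is None else first_leaf)
        return k

    walk(tree)
    return labels, leftmost


def keyroots(leftmost):
    """Nodes with no later node sharing their leftmost leaf, in increasing order."""
    last = {}
    for k, leaf in enumerate(leftmost):
        last[leaf] = k
    return sorted(last.values())


def tree_edit_distance(t1, t2, ins=1, dele=1, rel=1):
    """Zhang-Shasha edit distance between two ordered labeled trees."""
    lab1, lm1 = postorder(t1)
    lab2, lm2 = postorder(t2)
    td = [[0] * len(lab2) for _ in lab1]  # td[i][j]: subtree i vs subtree j
    for i in keyroots(lm1):
        for j in keyroots(lm2):
            a0, b0 = lm1[i], lm2[j]
            # fd[x][y]: forest lab1[a0:a0+x] vs forest lab2[b0:b0+y]
            fd = [[0] * (j - b0 + 2) for _ in range(i - a0 + 2)]
            for x in range(1, i - a0 + 2):
                fd[x][0] = fd[x - 1][0] + dele
            for y in range(1, j - b0 + 2):
                fd[0][y] = fd[0][y - 1] + ins
            for x in range(1, i - a0 + 2):
                for y in range(1, j - b0 + 2):
                    p, q = a0 + x - 1, b0 + y - 1  # postorder indices
                    if lm1[p] == a0 and lm2[q] == b0:
                        # Both forests are whole subtrees: compare their roots.
                        cost = 0 if lab1[p] == lab2[q] else rel
                        fd[x][y] = min(fd[x - 1][y] + dele, fd[x][y - 1] + ins,
                                       fd[x - 1][y - 1] + cost)
                        td[p][q] = fd[x][y]
                    else:
                        # Reuse the subtree distance computed for an earlier keyroot.
                        fd[x][y] = min(fd[x - 1][y] + dele, fd[x][y - 1] + ins,
                                       fd[lm1[p] - a0][lm2[q] - b0] + td[p][q])
    return td[-1][-1]


t9 = ("a", [("d", [("e", []), ("f", [])]), ("g", [("h", []), ("i", [])])])
t10 = ("c", [("h", []), ("i", []), ("j", []), ("k", [])])
print(tree_edit_distance(t9, t10))  # 7
```

Under unit costs, t9 and t10 are at distance 7. One optimal edit sequence relabels a to c, deletes d, e, f, and g, and inserts j and k.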
Next, suppose that you wish to delete tree t3 from file E. Simply type

delete t3 from file E;

So far, the trees you have accessed have format Node_L1. Now suppose that you wish to retrieve subtrees t
from file F that are within distance 5 of the following pattern, allowing zero or more prunings at nodes of t:

(e {size 3} (b {size 4}) (c {size 7}))

Since both the pattern and the trees to be retrieved have format Node_L2, you need to execute the command
nodeformat to notify the system of the change in node format (cf. Section 3.2.6). Then erase whatever is
left on the screen and input the new pattern and the following statement:
retrieve subtree t from file F where distwithprune(pa, t) < 5;

The result of this query looks as follows:
Figure 5.5: A screen layout.


The  left  tree  is  the  pattern;  the  right  tree  is  t4,  which  is  the  first  tree  in  file  F.  The  darkened  nodes  on  t4 
represent  solution  subtrees;  they  are  displayed  on  a  tree  basis  (cf.  Section  3.5.1). 

To see the mapping between the pattern and, say, the subtree rooted at node a, click the command
tree/subtree and then node a. Then click the command map_withprune. You will see the following:
Figure 5.6: A screen layout.


The  pruned  nodes  are  darkened.  To  see  the  distance  between  them,  click  the  command  distance. 

The cost function used thus far has been unit cost. Now suppose you wish to use constant costs, with an
insert/delete cost of 5 and a relabeling cost of 3. Further, suppose you wish to retrieve trees t from file
G that exactly match the pattern (a {csize 5} (c {csize 3}) (_x)), allowing zero or more cuttings at
nodes of t. Since both the node format and the cost functions have changed, you need to execute the commands
nodeformat and constant_cost to notify the system of these changes (cf. Section 3.2). Then input the pattern
and issue the statement

retrieve tree t from file G where distwithcut(pa, t) = 0;

The  result  of  this  query  is  as  follows: 
Figure 5.7: A screen layout.


The left tree is the pattern and the right tree is the first solution tree. To see which node (subtree) is
matched with _x, click the command instantiate. The screen will look as follows:
Figure 5.8: A screen layout.
To  conclude  this  session,  click  the  command  exit.  This  will  take  you  back  to  the  UNIX  environment. 